REMARKS



Claims 42-50, 67, 69-72, 74, 75, 80, 81, 88, 90-93, and 95-104 are pending in the 
application. Claims 42-44, 46-48, 67, 69, 70, 72, 74, 75, 80, 81, 88, 90-93, and 95-99 have been 
amended. Claims 42-45, 67, 69, 70, 80, 88, 90-93, and 95-104 have been withdrawn from 
consideration. Claims 1-41, 51-66, 68, 73, 76-79, 82-87, 89, and 94 have been cancelled without 
prejudice. These amendments add no new matter. 

Restriction/Election 

In response to the Restriction Requirement, Applicants elect the invention of Group III, 
drawn to nucleic acids encoding the LBP-2 polypeptide of SEQ ID NO:43 and variants and 
fragments thereof. The election is made with traverse. 

The Examiner divided the claims into nine separate groups. For the reasons provided 
below, applicants respectfully request that the claims of Groups II, VI, VII, and IX (partially) be 
examined together with the claims of Group III in the present application. 

The pending claims are directed to nucleic acids encoding a novel LDL-binding 
polypeptide ("LBP") termed LBP-2. Both human and rabbit LBP-2 sequences are recited in the 
claims. Groups II, III, VI, VII, and IX (partially) are directed to human LBP-2 sequences. 
Groups I, IV, V, VIII, and IX (partially) are directed to non-elected rabbit LBP-2 sequences. 

The human LBP-2 polypeptide is described in SEQ ID NO:43 (full-length polypeptide) 
and SEQ ID NO:7 (amino acids 322-538 of human LBP-2). The human LBP-2 polypeptide of 
SEQ ID NO:7 (Group II) is a fragment of the elected SEQ ID NO:43. SEQ ID NO:16 (Group VI) 
is a specific nucleotide sequence encoding SEQ ID NO:7, and SEQ ID NO:45 (Group VII) is a 
specific nucleotide sequence encoding SEQ ID NO:43. SEQ ID NOS:30, 31, 32, and 33 
(Group IX) each encode a polypeptide fragment of the elected SEQ ID NO:43. 

Because of the extremely high sequence relatedness between SEQ ID NO:43 and SEQ ID 
NO:7, applicants submit that prosecution will be facilitated by the simultaneous examination of 
nucleic acids encoding each of these human LBP-2 polypeptides. In addition, the issues raised 
during the course of prosecution of these human LBP-2 nucleic acid sequences are expected to 
be similar and, therefore, simultaneous examination is not expected to be unduly burdensome. 
Applicants also note that in a parent application of the present application (U.S. Patent No. 
6,632,923), all of the human LBP-2 polypeptides were examined together with the respective 
variants and fragments. Applicants have cancelled claims directed to rabbit LBP-2 nucleic acids 
from the present application. 

In light of the above comments, applicants respectfully request that the Examiner 
examine the human LBP-2 nucleic acid sequences of Groups II, III, VI, VII, and IX. 

35 U.S.C. § 112, 1st Paragraph (Enablement) 

On pages 8-9 of the Office Action, the Examiner rejected claims 59-61, 72, 74, and 75 as 
allegedly "containing subject matter which was not described in the specification in such a way 
as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, 
to make and/or use the invention." In particular, the Examiner stated that 

the specification fails to describe or provide guidance about the nucleotide sequence that 
encodes a polypeptide comprising an amino acid sequence having identity to a fragment 
of at least 10 or 20 or 30 amino acid residues of the encoded polypeptide of SEQ ID NO: 
43 (claims 72, 74, 75). It is not clear to a skilled artisan that what is the position of these 
10, 20, and 30 amino acids in relation to the amino acid sequence set forth in SEQ ID 
NO: 43. Although Examples 2, 3, 4, 5 (pages 40-45) demonstrate the full-length cDNA 
encoding LDL binding protein, this is not demonstrative of any fragments or analogs that 
are claimed in claims 59-61 and claims 72, 74, and 75. For these reasons it would require 
undue experimentation to make the claimed invention. 

Claims 59-61 have been cancelled, thereby rendering their rejection moot. 

Applicants traverse the rejection of claims 72, 74, and 75 in light of the claim 
amendments and the following comments. 

As amended, claims 72, 74, and 75 are directed to an isolated nucleic acid comprising a 
nucleotide sequence that encodes a polypeptide comprising an amino acid sequence that binds to 
LDL and is identical to a fragment of at least 10, 20, or 30 amino acid residues of the human 
LBP-2 polypeptide of SEQ ID NO:43. A person skilled in the biological arts knows how to 
generate nucleic acids encoding fragments of the LBP-2 polypeptide and how to test the ability 
of such polypeptide fragments to bind to LDL. For example, as detailed in the specification 
(see, e.g., page 21), fragments of a polypeptide can be generated by removing one or more 
nucleotides from one end (for a terminal fragment) or both ends (for an internal fragment) of a 
nucleic acid that encodes a polypeptide. Nucleic acids that encode fragments of a polypeptide 
can also be generated by, e.g., random shearing, endonuclease restriction digestion, or a 
combination of any of these methods. Expression of such a recombinant DNA would produce the 
desired LBP-2 fragments. 

The specification instructs how to evaluate the ability of LBP-2 polypeptide fragments to 
bind to LDL by using methods such as affinity chromatography, affinity coelectrophoresis, or 
ELISA (see specification at page 21, line 2 to page 22, line 3). Consistent with the preceding 
comments, Example 8 indicates that a particular stretch of acidic amino acids of the human 
LBP-2 (about amino acids 329-354) participates in the binding of LBP-2 to LDL (page 49, 
lines 4-20). Examples 9 and 10 further detail methods for determining whether LBP-2 
polypeptides bind to LDL in the presence of a given candidate inhibitor (page 49, line 22 to 
page 51, line 10). 

In light of the foregoing, a person of ordinary skill in the biological arts would have been 
able to make and use an isolated nucleic acid that encodes an LDL-binding fragment of LBP-2 
without undue experimentation and with a reasonable expectation of success. 

At pages 9-12 of the Office Action, the Examiner rejected claims 46-48, 59-61, 72, 73-75, 81, 
and 85-87 as allegedly not enabled. In particular, the Examiner stated that 

the specification, while being enabling for an isolated nucleic acid comprising a sequence 
that encodes a polypeptide of an amino acid sequence set forth in SEQ ID NO:43 that 
binds to low density lipoprotein (LDL); does not reasonably provide enablement for all 
the LDL binding proteins, and fragments and mutants generated from any position 
located on the sequence of SEQ ID NO:43. The specification does not enable any person 
skilled in the art to which it pertains, or with which it is most nearly connected, to make 
and/or use the invention commensurate in scope with these claims. The specification, 
however, only discloses cursory conclusions (see page 8-24) to support the findings. 



Claims 59-61, 73, and 85-87 have been cancelled, thereby rendering their rejection moot. 

Applicants respectfully traverse the rejection of claims 46-48, 72, 74-75, and 81 in light 
of the claim amendments and the comments provided below. 

The claims rejected herein are directed to variants or fragments of human LBP-2 that 
retain the ability to bind LDL. It is well within the grasp of the biologist of ordinary skill to 
prepare, for example, a polypeptide having at least 80%, at least 90%, at least 95%, or at least 
98% sequence identity to the human LBP-2 of SEQ ID NO:43. The specification details standard 
mutagenesis methods that can be used to make amino acid sequence variants (page 19, line 2 to 
page 20, line 2). Furthermore, the specification instructs, and the skilled biologist is well aware, 
that conservative amino acid substitutions can be made in the LBP-2 polypeptide sequence so as 
to reduce the likelihood that a given amino acid sequence will result in a loss of LBP-2 function 
(page 17, line 21 to page 18, line 15). In addition, fragments of the full-length LBP-2 polypeptide 
can be generated by using standard techniques that are detailed above and in the specification 
(e.g., page 21, line 2 to page 22, line 3). 

In addition to being able to readily produce nucleic acids encoding human LBP-2 
fragments or sequence variants, it would have required no undue experimentation for the skilled 
artisan to identify those variants that retain the specific LDL binding activity recited in 
claims 46-48, 74-75, and 81. By using the assays described in the specification (e.g., page 47, 
line 23 to page 49, line 20), the skilled artisan would have been able to determine, without undue 
experimentation and with a reasonable expectation of success, whether a given human LBP-2 
fragment or sequence variant binds to LDL. 

At page 11 of the Office Action, the Examiner stated that the "specification has provided 
no guidance to enable one of ordinary skill in the art to determine, without undue 
experimentation, the positions in the protein, which are tolerant to change (e.g., by amino acid 
deletions, insertions, or substitutions) and the nature and extent of changes that can be made in 
these positions." 

Although it is possible in certain cases to abolish the functional activity of a protein by 
mutating a critical amino acid residue, this does not mean that one of ordinary skill cannot 
nonetheless readily make functional analogs of a given protein (e.g., LBP-2) without undue 
experimentation. In fact, as detailed in the enclosed publication of Bowie et al. (1990) Science 
247:1306-10 ("Exhibit A"), "proteins are surprisingly tolerant of amino acid substitutions." 
Exhibit A cites as evidence of this assertion a study carried out on the lac repressor that found 
that of approximately 1500 single amino acid substitutions at 142 positions in the protein, "about 
one-half of all substitutions were phenotypically silent." Thus, one can expect, based on 
Exhibit A's disclosure, that a significant percentage of random substitutions in a given protein 
will result in mutated proteins with full or nearly full activity. These are far better odds than 
those at issue in In re Wands, 858 F.2d 731 (Fed. Cir. 1988), cited by the Examiner on page 9, in 
which the court found that screening many hybridomas to find the few that fell within the claims 
was not undue experimentation. The question is not whether it is possible to abolish activity of a 
given protein by introducing a point mutation, but rather whether one of ordinary skill can 
produce, without undue experimentation, mutants in which the activity is not abolished. 

Based on Exhibit A's disclosure, one would predict that even random substitution of 
amino acid residues of a human LBP-2 polypeptide would result in a large pool of mutants 
having the LDL binding activity recited in the claims. Furthermore, as detailed herein, the 
specification amply teaches the skilled artisan how to select those mutants having the activity 
required by the claims. In light of these comments, Applicants submit that one of ordinary skill 
in the art would have been able, at the filing of the present application, to make and use the 
claimed nucleic acids without undue experimentation. Accordingly, Applicants request that the 
Examiner withdraw the rejection. 

35 U.S.C. § 112, 2nd Paragraph (Indefiniteness) 

Claims 46-48 and 59-61 were rejected as allegedly indefinite in their use of the term 
"identical." Claims 59-61 have been cancelled, thereby rendering their rejection moot. For 
claims 46-48, Applicants have adopted the Examiner's suggested language by directing the 
claims to an isolated nucleic acid comprising a nucleotide sequence that encodes a polypeptide 
comprising an amino acid sequence and has at least 80%, 90%, or 95% "sequence identity" to 
the sequence of SEQ ID NO:43. In light of these amendments, Applicants request that the 
Examiner withdraw the rejection. 

The Examiner rejected independent claims 46, 59, 73, 81, and 85 as allegedly indefinite 
because of the use of the term "LDL" in the absence of the fully spelled out phrase "low density 
lipoprotein." Claim 42 has been amended to provide the fully spelled out term, which is followed 
by the acronym throughout the remainder of the claims. In light of these amendments, Applicants 
request that the Examiner withdraw the rejection. 

35 U.S.C. § 102(e) (Anticipation) 

The Examiner rejected claims 72, 73, and 85 as allegedly anticipated by Colasanti et al., 
U.S. Patent No. 6,177,614 ("Colasanti"). According to the Examiner, 

Colasanti's peptide is considered for the encoded peptide sequence fragment of at least 
10 amino acid residues of SEQ ID NO:43 (claims 72, 85). Colasanti's peptide having the 
structure of the claimed encoded peptide of instant application considered anticipating the 
LDL binding of the claimed peptide (73). Therefore, claims 72, 73, and 85 of the instant 
application are being anticipated by Colasanti et al. 

Claims 73 and 85 have been cancelled, rendering the rejection moot for these claims. 
Applicants traverse the rejection of claim 72 in light of the claim amendments and the 
following comments. 

Claim 72, as amended, is directed to an isolated nucleic acid comprising a nucleotide 
sequence that encodes a polypeptide comprising an amino acid sequence that binds to LDL and 
is identical to a fragment of at least ten amino acids of SEQ ID NO:43. 

Colasanti discloses the Id gene in maize plants, which is similar to that of genes encoding 
zinc-finger regulatory proteins in animals. However, nothing in Colasanti suggests that the Id 
gene encodes a protein that binds LDL. 

"The fact that a certain result or characteristic may occur or be present in the prior art is 
not sufficient to establish the inherency of that result or characteristic" (MPEP § 2112, citing In 
re Rijckaert, 9 F.3d 1531, 1534 (Fed. Cir. 1993)) (emphasis in original). To rely on inherency, 
"the examiner must provide a basis in fact and/or technical reasoning to reasonably support the 
determination that the allegedly inherent characteristic necessarily flows from the teachings of 
the applied prior art." (MPEP § 2112, citing Ex parte Levy, 17 USPQ2d 1461, 1464 (Bd. Pat. 
App. & Inter. 1990)) (emphasis in original). "Inherency, however, may not be established by 
probabilities or possibilities." (MPEP § 2112, citing In re Robertson, 169 F.3d 743, 745 (Fed. 
Cir. 1999)). 

There is no evidence of record that would lead the skilled artisan to conclude that a 
peptide disclosed by Colasanti binds to LDL. Because there is no basis in fact or technical 
reasoning to reasonably conclude that Colasanti discloses an isolated nucleic acid sequence that 
encodes a polypeptide comprising an amino acid sequence that is identical to a fragment of at 
least 10 amino acids of SEQ ID NO:43 and also binds to LDL, the reference does not anticipate 
claim 72. Applicants request that the Examiner withdraw the rejection. 



CONCLUSIONS 

Applicants submit that all grounds for rejection have been overcome, and that all claims 
are now in condition for allowance. 

Enclosed is a Petition for Two Month Extension of Time and a check for the Petition for 
Extension of Time fee. Please apply any other charges or credits to deposit account 06-1050, 
referencing Attorney Docket No. 10797-004002. 

Respectfully submitted, 

Jack Brennan 
Reg. No. 47,443 

Fish & Richardson P.C. 
45 Rockefeller Plaza, Suite 2800 
New York, New York 10111 
Telephone: (212) 765-5070 
Facsimile: (212) 258-2291 






Exhibit A 



Deciphering the Message in Protein Sequences: 
Tolerance to Amino Acid Substitutions 

James U. Bowie, John F. Reidhaar-Olson, Wendell A. Lim, 
Robert T. Sauer 



An amino acid sequence encodes a message that determines the shape and function of a 
protein. This message is highly degenerate in that many different sequences can code for 
proteins with essentially the same structure and activity. Comparison of different sequences 
with similar messages can reveal key features of the code and improve understanding of how a 
protein folds and how it performs its function. 

THE GENOME IS MANIFEST LARGELY IN THE SET OF PROTEINS that it encodes. It is the ability of 
these proteins to fold into unique three-dimensional structures that allows them to function 
and carry out the instructions of the genome. Thus, comprehending the rules that relate amino 
acid sequence to structure is fundamental to an understanding of biological processes. Because 
an amino acid sequence contains all of the information necessary to determine the structure of 
a protein (1), it should be possible to predict structure from sequence, and subsequently to 
infer detailed aspects of function from the structure. However, both problems are extremely 
complex, and it seems unlikely that either will be solved in an exact manner in the near 
future. It may be possible to obtain approximate solutions by using experimental data to 
simplify the problem. In this article, we describe how an analysis of allowed amino acid 
substitutions in proteins can be used to reduce the complexity of sequences and reveal 
important aspects of structure and function. 



Methods for Studying Tolerance to Sequence Variation 

There are two main approaches to studying the tolerance of an amino acid sequence to change. 
The first method relies on the process of evolution, in which mutations are either accepted or 
rejected by natural selection. This method has been extremely powerful for proteins such as 
the globins or cytochromes, for which sequences from many different species are known (2-7). 
The second approach uses genetic methods to introduce amino acid changes at specific positions 
in a cloned gene and uses selections or screens to identify functional sequences. This approach 
has been used to great advantage for proteins that can be expressed in bacteria or yeast, where 
the appropriate genetic manipulations are possible (5, 8-11). The end results of both methods 
are lists of active sequences that can be compared and analyzed to identify sequence features 
that are essential for folding or function. If a particular property of a side chain, such as 
charge or size, is important at a given position, only side chains that have the required 
property will be allowed. Conversely, if the chemical identity of the side chain is 
unimportant, then many different substitutions will be permitted. 

Studies in which these methods were used have revealed that proteins are surprisingly tolerant 
of amino acid substitutions (2-4, 11). For example, in studying the effects of approximately 
1500 single amino acid substitutions at 142 positions in lac repressor, Miller and co-workers 
found that about one-half of all substitutions were phenotypically silent (11). At some 
positions, many different, nonconservative substitutions were allowed. Such residue positions 
play little or no role in structure and function. At other positions, no substitutions or only 
conservative substitutions were allowed. These residues are the most important for lac 
repressor activity. 

What roles do invariant and conserved side chains play in proteins? Residues that are directly 
involved in protein functions such as binding or catalysis will certainly be among the most 
conserved. For example, replacing the Asp in the catalytic triad of trypsin with Asn results in 
a 10^4-fold reduction in activity (12). A similar loss of activity occurs in λ repressor when a 
DNA binding residue is changed from Asn to Asp (13). To carry out their function, however, 
these catalytic residues and binding residues must be precisely oriented in three dimensions. 
Consequently, mutations in residues that are required for structure formation or stability can 
also have dramatic effects on activity (10, 14-16). Hence, many of the residues that are 
conserved in sets of related sequences play structural roles. 



Substitutions at Surface and Buried Positions 

In their initial comparisons of the globin sequences, Perutz and co-workers found that most 
buried residues require nonpolar side chains, whereas few features of surface side chains are 
generally conserved (^). Similar results have been seen for a number of protein families 
(2, 4, 5, 7, 17, 18). An example of the sequence tolerance at surface versus buried sites can 
be seen in Fig. 1, which shows the allowed substitutions in λ repressor at residue positions 
that are near the dimer interface but distant from the DNA binding surface of the protein (9). 
These substitutions were identified by a functional 

Fig. 1. (A) Amino acid substitutions allowed in a short region of λ repressor. The wild-type 
sequence is shown along the center line. The allowed substitutions shown above each position 
were identified by randomly mutating one to ... using a cassette method and applying a 
functional selection (5). (B) The fractional solvent accessibility (42) of the wild-type side 
chain in the protein dimer (43) relative to the same atoms in an Ala-X-Ala model tripep- 

are buried in the protein. In contrast, most of the highly exposed positions tolerate a wide 
range of chemically different side chains, including hydrophilic and hydrophobic residues. 
Hence, it seems that most of the structural information in this region of the protein is 
carried by the residues that are solvent inaccessible. 

Constraints on Core Sequences 

16 MARCH 1990 




phylogenetic studies, where it has been noted that the size decreases and increases at 
interacting residues are not necessarily related in a simple complementary fashion (5, 7, 17). 
... conformational changes in ... side chains and by a variety of backbone movements. 

The Informational Importance of the Core 

With occasional exceptions, the core must remain hydrophobic and maintain a reasonable packing 
density. However, ... composed of side chains that can ... steric clashes. How important are 
hydrophobicity, volume, and steric complementarity in determining whether a given 



Fig. 2. Amino acid substitutions allowed in the core of λ repressor. The wild-type side chains 
are shown pictorially in the approximate orientation seen in the crystal structure (<?). The 
lists of allowed substitutions at each position are shown below the wild-type side chains. 
These substitutions were identified by randomly mutating one to four residues at a time by 
using a cassette method and applying a functional selection (20). Not all substitutions are 
allowed in every sequence background. 

the appropriate hydrophobic residues, a significant fraction were acceptable. Hence, the 
hydrophobicity of a sequence contains more information about its potential acceptability in 
the core than does the total side chain volume. Steric compatibility was intermediate between 
volume and hydrophobicity in informational importance. 



The Informational Importance of Surface Sites 

We have noted that many surface sites can tolerate a wide variety of side chains, including 
hydrophilic and hydrophobic residues. This result might be taken to indicate that surface 
positions contain little structural information. However, Bashford et al., in an extensive 
analysis of globin sequences (4), found a strong bias against large hydrophobic residues at 
many surface positions. At one level, this ... protein solubility, because large patches of 
hydrophobic surface residues would presumably lead to aggregation. At a more fundamental level, 
protein folding requires a partitioning between surface and buried positions. Consequently, to 
achieve a unique native state without significant competition from ... it may also be 
important that some sites have a decided preference for exterior rather than interior 
positions. As a result, many surface sites can accept hydrophobic residues individually, but 
the surface as a whole can probably tolerate only a moderate number of hydrophobic side chains. 

Identification of Residue Roles from Sets of Sequences 

Often, a protein of interest is a member of a family of related sequences. What can we infer 
from the pattern of allowed substitutions at positions in sets of aligned sequences generated 
by genetic or phylogenetic methods? Residue positions that can accept a number of different 
side chains, including charged and highly polar residues, are almost certain to be on the 
protein surface. Residue positions that remain hydrophobic, whether variable or not, are ... 
Fig. 3, residue ... hydrophilic side chains are shown in orange and those that cannot accept 
hydrophilic side chains ... hydrophobic positions ... positions that can accept hydrophilic 
side chains define the surface. 

Functionally important residues should be conserved in sets of ... whether a side chain ... 
conserved. To make this distinction requires an independent assay of protein folding. The 
ability of a mutant protein to maintain a stably folded structure can often be measured by 
biophysical techniques ... (27, 28). In the latter ... Sets of sequences that allow formation 
of a stable structure can then be compared to sets that allow both folding and function, with 
the ... the set of stable ... the set of functional proteins. The DNA-... method (...). 
... hormone were also identified by comparing the stabilities and activities of a set of ... 
mutants ... hormones with different binding specificities. 

Implications for Structure Prediction 

At present, the only reliable method for predicting a low-resolution tertiary structure of a 
new protein is by identifying sequence similarity to a protein whose structure is already known 
(29, 30). However, it is often difficult to align sequences as the level of sequence similarity 
decreases, and it is sometimes impossible to detect statistically significant sequence 
similarity between distantly related proteins. Because the number of known sequences is far 
greater than the number of known structures, it would be advantageous to increase the reach of 
the available structural information by improving methods for detecting distant sequence 
relations and for subsequently aligning these sequences based on structural principles. In a 
normal homology search, the sequence database is scanned with a single test sequence, and every 
residue must be weighted equally. However, some residues are more important than others and 
should be weighted accordingly. Moreover, certain regions of the protein are more likely to 
contain gaps than others. Both kinds of information can be obtained from sequence sets, and 
several techniques have been used to combine such information into more appropriately weighted 
sequence searches and alignments (31). These methods were used to align the sequences of 
retroviral proteases with aspartic proteases, which in turn allowed construction of a 
three-dimensional model for the protease of human immunodeficiency virus type 1 (29). 
Comparison with the recently determined crystal structure of this protein revealed reasonable 
agreement in many areas of the predicted structure (32). 

The structural information at most surface sites is highly degenerate. Except for functionally 
important residues, exterior positions seem to be important chiefly in maintaining a reasonably 
polar surface. The information contained in buried residues is also degenerate, the main 
requirement being that these residues remain hydrophobic. Thus, at its most basic level, the 
key structural message in an amino acid sequence may reside in its specific pattern of 
hydrophobic and hydrophilic residues. This is meant in an informational sense. Clearly, the 
precise structure and stability of a protein depends on a large number of detailed 
interactions. It is possible, however, that structural prediction at a more primitive level 
can be accomplished by concentrating on the most basic informational aspects of an amino acid 
sequence. For example, amphipathic patterns can be extracted from aligned sets of sequences 
and used, in some cases, to identify secondary structures. 

If a region of secondary structure is packed against the hydrophobic core, a pattern of 
hydrophobic residues reflecting the periodicity of the secondary structure is expected 
(33, 34). These patterns can be obscured in individual sequences by hydrophobic residues on 
the protein surface. It is rare, however, for a surface position to remain hydrophobic over 
the course of evolution. Consequently, the amphipathic patterns expected for simple secondary 
structures can be much clearer in a set of related sequences (d). This principle is 
illustrated in Fig. 4, which shows helical hydrophobic moment plots for the Antennapedia 
homeodomain sequence (Fig. 4A) and for a composite sequence derived from a set of homologous 
homeodomain proteins (Fig. 4B) (35). The hydrophobic moment is a simple measure of the degree 
of amphipathic character of a sequence in a given secondary structure (34). The amphipathic 
character of the three α-helical regions in the Antennapedia protein (36) is clearly revealed 
only by the analysis of the combined set of homeodomain sequences. The secondary structure of 
Arc repressor, a small DNA-binding protein, was recently predicted by a similar method (8) and 
confirmed by nuclear magnetic resonance studies (37). 
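
For illustration only, the hydrophobic moment mentioned in this paragraph can be sketched as 
the magnitude of a sum of per-residue hydrophobicity scores, each rotated by the angular step 
of the secondary structure. The 100-degree step per residue (the usual value for an α-helix), 
the function name, and the example scores below are assumptions made for this sketch; they are 
not taken from the article. 

```python
import cmath
import math


def hydrophobic_moment(scores, delta_deg=100.0):
    """Return |sum_n H_n * exp(i * n * delta)| for per-residue scores H_n.

    delta_deg is the rotation per residue; 100 degrees is assumed here
    as the conventional value for an alpha-helix.
    """
    delta = math.radians(delta_deg)
    total = sum(h * cmath.exp(1j * n * delta) for n, h in enumerate(scores))
    return abs(total)


# Hypothetical example scores (not from the article):
# an evenly hydrophobic 18-residue stretch has no preferred face ...
uniform = [1.0] * 18
# ... whereas scores that rise and fall with the helical period do.
amphipathic = [math.cos(math.radians(100.0) * n) for n in range(18)]

print(hydrophobic_moment(uniform))
print(hydrophobic_moment(amphipathic))
```

Eighteen residues at 100 degrees each complete five full turns, so the uniform stretch sums to 
a moment of essentially zero, while the periodic pattern gives a large moment. 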

The specific pattern of hydrophobic and hydrophilic residues in an amino acid sequence must 
limit the number of different structures a given sequence can adopt and may indeed define its 
overall fold. If this is true, then the arrangement of hydrophobic and hydrophilic residues 
should be a characteristic feature of a particular fold. Sweet and Eisenberg have shown that 
the correlation of the pattern of hydrophobicity between two protein sequences is a good 
criterion for their structural relatedness (38). In addition, several studies indicate that 
patterns of obligatory hydrophobic positions identified from aligned sequences are distinctive 
features of sequences that adopt the same structure (4, 29, 38, 39). Thus, the order of 
hydrophobic and hydrophilic residues in a sequence may actually be sufficient information to 
determine the basic folding pattern of a protein sequence. 

Although the pattern of sequence hydrophobicity may be a
characteristic feature of a particular fold, it is not yet clear how such
patterns could be used for prediction of structure de novo. It is
important to understand how patterns in sequence space can be
related to structures in conformation space. Lau and Dill have
approached this problem by studying the properties of simple
sequences composed only of H (hydrophobic) and P (polar) groups
on two-dimensional lattices (40). An example of such a representation
is shown in Fig. 5. Residues adjacent in the sequence must
occupy adjacent squares on the lattice, and two residues cannot
occupy the same space. Free energies of particular conformations are
evaluated with a single term, an attraction of H groups. By
considering chains of ten residues, an exhaustive conformational
search for all 1024 possible sequences of H and P residues was
possible. For longer sequences only a representative fraction of the
allowed sequence or conformation space could be explored. The
significant results were as follows: (i) not all sequences can fold into
a "native" structure and only a few sequences form a unique native
structure; (ii) the probability that a sequence will adopt a unique
native structure increases with chain length; and (iii) the native
states are compact, contain a hydrophobic core surrounded by polar
residues, and contain significant secondary structure. Although the
gap between these two-dimensional simulations and three-dimensional
structures is large, the use of simple rules and sequence
representations yields results similar to those expected for real
proteins. Three-dimensional lattice methods are also beginning to
be developed and evaluated (41).
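
The lattice rules described above are simple enough to sketch in a few lines. The following is a minimal illustration, not the implementation of reference (40): it enumerates self-avoiding chains on a square lattice, scores each conformation as minus the number of contacts between H residues that are lattice neighbors but not chain neighbors (a unit contact energy is an assumption), and reports the lowest energy and how many conformations reach it. The names `conformations`, `energy`, and `ground_state` are hypothetical. Rotations and mirror images are counted once.

```python
MOVES = ((1, 0), (0, 1), (-1, 0), (0, -1))


def conformations(n):
    """Yield self-avoiding walks of n residues on a square lattice.

    Symmetry copies are removed by fixing the first step along +x and
    requiring the first step that leaves the x-axis to go in +y.
    """
    def extend(path, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            if not turned and dy == -1:
                continue  # mirror image of a walk that turns toward +y
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue  # two residues cannot occupy the same square
            path.append(nxt)
            yield from extend(path, turned or dy != 0)
            path.pop()

    if n == 1:
        yield ((0, 0),)
    else:
        yield from extend([(0, 0), (1, 0)], False)


def energy(seq, path):
    """Return minus the number of H-H contacts between non-bonded residues."""
    index_at = {p: i for i, p in enumerate(path)}
    contacts = 0
    for i, (x, y) in enumerate(path):
        if seq[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # look right and up: each pair once
            j = index_at.get((x + dx, y + dy))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


def ground_state(seq):
    """Return (lowest energy, number of conformations with that energy)."""
    best, count = None, 0
    for path in conformations(len(seq)):
        e = energy(seq, path)
        if best is None or e < best:
            best, count = e, 1
        elif e == best:
            count += 1
    return best, count
```

In this sketch a sequence has a unique native structure when `ground_state` returns a count of 1 with a negative energy. Looping over all 1024 H/P sequences of length ten repeats the exhaustive search described in the text, although pure Python will be slow for much longer chains.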



Summary

There is more information in a set of related sequences than in a
single sequence. A number of practical applications arise from an
analysis of the tolerance of residue positions to change. First, such
information permits the evaluation of a residue's importance to the
function and stability of a protein. This ability to identify the
essential elements of a protein sequence may improve our
understanding of the determinants of protein folding and stability as well
as protein function. Second, patterns of tolerance to amino acid
substitutions of varying hydrophilicity can help to identify residues
likely to be buried in a protein structure and those likely to occupy



Fig. 4. Helical hydrophobic moments calculated by using (A) the
Antennapedia homeodomain sequence or (B) a set of 39 aligned
homeodomain sequences (35). The bars indicate the extent of the helical
regions identified in nuclear magnetic resonance studies of the
Antennapedia homeodomain (36). To determine hydrophobic moments,
residues were assigned to one of three groups: H1 (high
hydrophobicity = Trp, Ile, Phe, Leu, Met, Val, or Cys); H2 (medium
hydrophobicity = Tyr, Pro, Ala, Thr, His, Gly, or Ser); and H3 (low
hydrophobicity = Gln, Asn, Glu, Asp, Lys, or Arg). For the aligned
homeodomain sequences, the residues at each position were sorted by
their hydrophobicity by using the scale of Fauchere and Pliska (45).
Arg and Lys were not counted unless no other residue was found at the
position, because they contain long aliphatic side chains and can
thereby substitute for nonpolar residues at some buried sites. To
account for possible sequence errors and rare exceptions, the most
hydrophilic residue allowed at each position was discarded unless it was
observed twice. The second most hydrophilic residue was then chosen to
represent the hydrophobicity of each position. An eight-residue window
was used and the vectors projected radially every 100°. The vector
magnitudes were assigned a value of 1, 0, or -1 for positions where the
hydrophobicity group was H1, H2, or H3, respectively.
Fig. 5. A representation of one compact conformation for a particular
sequence of H and P residues on a two-dimensional square lattice.
(Adapted from (40), with permission of the American Chemical Society.)
surface positions. The amphipathic patterns that emerge can be used
to identify probable regions of secondary structure. Third,
incorporating a knowledge of allowed substitutions can improve the ability
to detect and align distantly related proteins because the essential
residues can be given prominence in the alignment scoring.
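
The amphipathic patterns mentioned above can be located with a helical hydrophobic moment. The sketch below follows the parameters given in the Fig. 4 legend (three hydrophobicity groups scored 1, 0, or -1, an eight-residue window, and vectors projected every 100°) but handles only a single sequence. It omits the legend's per-position selection of a representative residue from an aligned set, and the function name `helical_moments` is hypothetical.

```python
import math

# Residue groups as listed in the Fig. 4 legend (one-letter codes).
H1 = set("WIFLMVC")  # high hydrophobicity   -> +1
H2 = set("YPATHGS")  # medium hydrophobicity ->  0
H3 = set("QNEDKR")   # low hydrophobicity    -> -1


def group_value(residue):
    """Map a one-letter residue code to its group value of 1, 0, or -1."""
    if residue in H1:
        return 1
    if residue in H2:
        return 0
    if residue in H3:
        return -1
    raise ValueError(f"unknown residue: {residue!r}")


def helical_moments(seq, window=8, angle_deg=100.0):
    """Return the hydrophobic moment magnitude for each window along seq.

    Residue k of a window contributes a vector of length group_value
    pointing at k * angle_deg; the moment is the length of the vector sum.
    """
    delta = math.radians(angle_deg)
    values = [group_value(r) for r in seq]
    moments = []
    for start in range(len(values) - window + 1):
        sx = sum(values[start + k] * math.cos(k * delta) for k in range(window))
        sy = sum(values[start + k] * math.sin(k * delta) for k in range(window))
        moments.append(math.hypot(sx, sy))
    return moments
```

A window whose hydrophobic residues recur with helical periodicity, such as `LKKLLKKL`, gives a much larger moment than a window of uniform composition, so peaks in the returned list mark candidate amphipathic helices.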

As more sequences are determined, it becomes increasingly likely
that a protein of interest is a member of a family of related
sequences. If this is not the case, it is now possible to use genetic
methods to generate lists of allowed amino acid substitutions.
Consequently, at least in the short term, it may not be necessary to
solve the folding problem for individual protein sequences. Instead,
information from sequence sets could be used. Perhaps by simplifying
sequence space through the identification of key residues, and by
simplifying conformation space as in the lattice methods, it will be
possible to develop algorithms to generate a limited number of trial
structures. These trial structures could then, in turn, be evaluated by
further experiments and more sophisticated energy calculations.